Letter Level Learning for Language Independent Diacritics Restoration

Authors

  • Rada Mihalcea
  • Vivi Nastase
Abstract

[…] European languages considered, English is the only one that does not have diacritics restoration problems. The few words that use special characters are borrowed from other languages.

Similar resources

Instant Diacritics Restoration System for Sindhi Accent Prediction using N-Gram and Memory-Based Learning Approaches

The script of the Sindhi language is highly complex, in part because of its abundance of homographic words. A homographic word can carry multiple meanings, so the text is hard to interpret unless the pronunciation is specified with diacritics. Diacritics help readers comprehend the text easily. Due to the rapidly developi...

Lexical Disambiguation of Igbo through Diacritic Restoration

Properly written texts in Igbo, a low resource African language, are rich in both orthographic and tonal diacritics. Diacritics are essential in capturing the distinctions in pronunciation and meaning of words, as well as in lexical disambiguation. Unfortunately, most electronic texts in diacritic languages are written without diacritics. This makes diacritic restoration a necessary step in cor...

Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text

Yorùbá is a widely spoken West African language with a writing system rich in tonal and orthographic diacritics. With very few exceptions, diacritics are omitted from electronic texts, due to limited device and application support. Diacritics provide morphological information, are crucial for lexical disambiguation, pronunciation and are vital for any Yorùbá text-to-speech (TTS), automatic spee...

A robust diacritics restoration system using unreliable raw text data

Statistical language models are utilized in many speech processing algorithms, e.g., automatic speech recognition (ASR). Such a model is created from a text corpus, but many of the text corpora for Romanian are unreliable with respect to the use of diacritic marks, i.e., diacritics are either partially or completely missing, resulting in low quality language models. We present a methodology for...

Higher Order n-gram Language Models for Arabic Diacritics Restoration

Dynamic programming based Arabic diacritics restoration aims to assign diacritics to Arabic words. The technique is a purely statistical approach and depends only on an Arabic corpus annotated with diacritics. The possible word sequences with diacritics are assigned scores using a statistical n-gram language modeling approach. Using the assigned scores, it is possible to search for the most likely sequ...

Publication date: 2002